Abstract- This paper presents a hybrid method for embedding a binary image into a digital video stream. The method combines three widely used transforms in image processing: the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT) and Singular Value Decomposition (SVD). To evaluate the algorithm and compare it with other methods, we use three metrics: peak signal-to-noise ratio (PSNR), mean square error (MSE) and correlation.
I. INTRODUCTION

Today's era is one of pervasive connectivity, both over the internet and through wireless networks. Devices such as digital cameras, camcorders, MP3 players and PDAs let us create, manipulate and enjoy multimedia data. The growth of the internet has brought electronic publishing, e-advertising, e-newspapers, e-magazines, e-libraries, online video, online audio, online product ordering, online transactions, real-time information delivery and much more. As a result, storing, transmitting and distributing digital videos over the internet has become very easy. However, video creators are reluctant to transmit and distribute valuable videos because of the problem of copyright protection. Digital data is very easy to copy, and a pasted copy looks exactly like the original, which invites the malicious practice known as piracy.

An effective way to protect multimedia data against illegal recording and retransmission is to embed a signal, called a digital signature, copyright label or watermark, into the cover medium to authenticate the owner of the data. This method is known as digital watermarking: a state-of-the-art technique for hiding a secret message in a cover medium in such a way that an ordinary viewer cannot see the message with the naked eye and perceives only the normal cover medium. The message may be the creator's name, a company logo or any other sign, and it can be extracted only by applying a specific extraction algorithm; in this way, proof of ownership can be given. Current interest focuses on providing proof of ownership and preventing unauthorized tampering with multimedia files, which is easy because the files are edited digitally. This is why both industry and academia are working seriously on digital watermarking. Techniques have been proposed for a variety of applications, including ownership protection, authentication and access control.

This paper describes a digital watermarking scheme in which video is the cover medium and a grayscale image is embedded in the video. The major factors increasing the demand for video watermarking [1,2] are as follows.

• Privacy of digital data is required, because copying a video is comparatively easy.

• Fighting intellectual property rights breaches.

• Tampering with digital video must be detectable.

• Copyright protection must not be eroded. 



II. VISUAL QUALITY METRICS



We mainly use the following visual quality metrics [14] to compare the degradation after the watermark is added to the video.

$$\mathrm{MSE} = \frac{1}{MN}\sum_{x=1}^{M}\sum_{y=1}^{N}\bigl(f(x,y) - f'(x,y)\bigr)^{2} \tag{1}$$

$$\mathrm{PSNR} = 10\log_{10}\frac{255^{2}}{\mathrm{MSE}} \tag{2}$$

Here MSE - Mean Square Error
PSNR - Peak Signal-to-Noise Ratio (dB)
f(x,y) - Original frame of the video
f'(x,y) - Watermarked frame of the video
M, N - Number of rows and columns of the frame

Peak signal-to-noise ratio [14], abbreviated PSNR, measures the similarity between two signals, one of which is the original and the other an altered version of it. PSNR is defined via the mean square error (MSE), which quantifies the difference between the original and the altered signal. PSNR is expressed on a logarithmic (decibel) scale, whereas MSE is expressed on a linear scale.
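Equations 1 and 2 can be checked with a short Python sketch. The function names `mse`, `psnr` and `psnr_from_mse` are hypothetical; frames are assumed to be nested lists of 8-bit luminance values, which is why the peak value is 255. As a consistency check, the frame 1 entry of Table 1 (MSE 12.9227) converts to a PSNR of about 37.017 dB, matching the tabulated value.

```python
import math


def mse(original, watermarked):
    """Mean square error (equation 1) between two equally sized M x N frames."""
    rows, cols = len(original), len(original[0])
    total = 0.0
    for row_f, row_w in zip(original, watermarked):
        for a, b in zip(row_f, row_w):
            total += (a - b) ** 2
    return total / (rows * cols)


def psnr_from_mse(err, peak=255.0):
    """PSNR in dB (equation 2) for a given MSE; infinite when the frames are identical."""
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)


def psnr(original, watermarked, peak=255.0):
    """PSNR in dB between an original and a watermarked frame."""
    return psnr_from_mse(mse(original, watermarked), peak)


f = [[52, 55], [61, 59]]
f_w = [[54, 55], [60, 59]]  # two pixels changed, by 2 and by 1
print(mse(f, f_w))  # 1.25
print(psnr_from_mse(12.9227))  # frame 1 of Table 1: about 37.017 dB
```

Because PSNR depends only on MSE and the peak value, the two metrics carry the same information; PSNR simply puts it on a decibel scale where higher means less distortion.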

At the receiver end, we extract the watermark and measure the correlation [14] between the recovered watermark and the original watermark to check robustness.
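The paper does not spell out the correlation formula. A common choice, assumed here, is the 2-D correlation coefficient (the quantity computed by MATLAB's `corr2`), which equals 1 for identical watermarks, falls as bits are corrupted, and equals -1 for an inverted watermark. The helper `corr2` below is a hypothetical pure-Python sketch, and the 3×3 watermarks are made-up illustration data.

```python
import math


def corr2(a, b):
    """2-D correlation coefficient between two equally sized matrices."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys))
    if den == 0:
        raise ValueError("correlation is undefined for a constant matrix")
    return num / den


w = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]      # original binary watermark
w_rec = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]  # recovered copy with one flipped bit
print(corr2(w, w))               # 1.0, up to floating-point rounding
print(round(corr2(w, w_rec), 4))  # 0.7906
```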



III. PROPOSED METHOD 



A. Embedding Algorithm 

The embedding process of the proposed method proceeds step by step as follows.

1. The video is taken and converted into a sequence of frames.

2. The first frame is taken and converted from the RGB colorspace to the YCbCr colorspace. The YCbCr colorspace is chosen for the following reasons:

• JPEG compression of a frame affects the RGB colorspace more than the YCbCr colorspace. Here Y is the luminance component of the frame, and Cb and Cr are the blue-difference and red-difference chrominance components.

• Because its chrominance components can be subsampled, it requires less disk space and bandwidth.

• It is the colorspace used in standard-definition (SD) digital video, which has lower bandwidth and needs to remain backward compatible.

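Equations 3 to 5 give the conversion from RGB to YCbCr, and equations 6 to 8 give the reverse conversion. In equations 3 to 5, R, G and B are normalized to [0, 1], which yields Y in [16, 235] and Cb, Cr in [16, 240]. Equations 6 to 8 assume Y normalized to [0, 1] and Cb, Cr centered on zero, so values produced by equations 3 to 5 must first be rescaled as (Y - 16)/219 and (C - 128)/224.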

Y = 16 + 65.481*R + 128.553*G + 24.966*B (3)

Cb = 128 - 37.797*R - 74.203*G + 112*B (4)

Cr = 128 + 112*R - 93.786*G - 18.214*B (5)

R = Y + 0*Cb + 1.402*Cr (6)

G = Y - 0.344136*Cb - 0.714136*Cr (7)

B = Y + 1.772*Cb + 0*Cr (8)
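The per-pixel conversion can be sketched in Python as follows. The sketch assumes R, G and B normalized to [0, 1], the scale on which equations 3 to 5 give Y in [16, 235]. Before equations 6 to 8 are applied, Y is rescaled by (Y - 16)/219 and Cb, Cr by (C - 128)/224; this rescaling is our reading of how the two sets of equations fit together, and with it they are inverses of each other up to the rounding of the published coefficients. Function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Equations 3-5: R, G, B in [0, 1] -> Y in [16, 235], Cb and Cr in [16, 240]."""
    y = 16 + 65.481 * r + 128.553 * g + 24.966 * b
    cb = 128 - 37.797 * r - 74.203 * g + 112.0 * b
    cr = 128 + 112.0 * r - 93.786 * g - 18.214 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Equations 6-8, applied after rescaling to full-range Y and zero-centered Cb, Cr."""
    y_n = (y - 16) / 219.0     # assumed rescaling: [16, 235] -> [0, 1]
    cb_n = (cb - 128) / 224.0  # assumed rescaling: [16, 240] -> [-0.5, 0.5]
    cr_n = (cr - 128) / 224.0
    r = y_n + 0 * cb_n + 1.402 * cr_n
    g = y_n - 0.344136 * cb_n - 0.714136 * cr_n
    b = y_n + 1.772 * cb_n + 0 * cr_n
    return r, g, b


print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white: approximately (235, 128, 128)
pixel = (0.2, 0.5, 0.9)
print(ycbcr_to_rgb(*rgb_to_ycbcr(*pixel)))  # close to the original pixel
```

Only the Y value returned by `rgb_to_ycbcr` is passed on to the transform stages; Cb and Cr are kept unchanged for the inverse conversion in step 11.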

3. The Y component of the frame is selected for watermarking.

The Y component is selected because the human visual system is more sensitive to changes in brightness than to changes in color. For this reason, JPEG compression does not subsample the Y component; chroma subsampling is applied only to Cb and Cr. A message embedded in the Y component is therefore likely to be better preserved than one embedded in Cb or Cr. Moreover, even when the frame is compressed, a viewer perceives the same information as in the original.

4. A two-dimensional Discrete Cosine Transform [3, 4, 5, 6] is applied to the Y frame.

Equations 9 and 10 give the 2D DCT and the 2D inverse DCT, respectively, in their standard orthonormal form for an M×N matrix, where f(x, y) is the spatial-domain representation and F(u, v) is the frequency-domain representation.

$$F(u,v) = C_M(u)\,C_N(v)\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)\cos\frac{(2x+1)u\pi}{2M}\cos\frac{(2y+1)v\pi}{2N} \tag{9}$$

$$f(x,y) = \sum_{u=0}^{M-1}\sum_{v=0}^{N-1} C_M(u)\,C_N(v)\,F(u,v)\cos\frac{(2x+1)u\pi}{2M}\cos\frac{(2y+1)v\pi}{2N} \tag{10}$$

where C_M(0) = √(1/M), C_M(u) = √(2/M) for u > 0, and C_N is defined in the same way with N.

The DCT converts a 2D spatial-domain representation into its frequency-domain equivalent. Several observations can be made about its output. First, the transformed matrix has exactly as many rows and columns as the original 2D matrix. Second, the upper-left corner of the transformed matrix contains the low-frequency part of the original signal and hence most of the signal energy; the top-left coefficient is the DC coefficient and all the others are AC coefficients. Third, moving toward the lower right in zigzag order, the frequency increases and the coefficient magnitudes generally decrease. Finally, for 8×8 blocks of level-shifted 8-bit samples, as in JPEG, the DC coefficient lies within the range -1024 to 1023, and neither DC nor AC coefficients are integers in general before quantization.
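The sketch below implements the orthonormal 2-D DCT and its inverse directly from the double-sum definitions. The orthonormal scaling (the one used by MATLAB's `dct2`) is an assumption, and the direct sums cost O(M²N²), so this is only suitable for small blocks; a practical implementation would use a fast transform. It also reproduces the DC range quoted above: a level-shifted all-white 8×8 block gives DC = 64 × 127 / 8 = 1016 and an all-black block gives -1024.

```python
import math


def _scale(k, n):
    """Orthonormal DCT-II weight: sqrt(1/n) for k = 0, sqrt(2/n) otherwise."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def _basis(k, x, n):
    """Cosine basis value for frequency k at sample position x."""
    return math.cos((2 * x + 1) * k * math.pi / (2 * n))


def dct2(f):
    """Forward 2-D DCT of an M x N matrix given as nested lists."""
    m, n = len(f), len(f[0])
    return [[_scale(u, m) * _scale(v, n) *
             sum(f[x][y] * _basis(u, x, m) * _basis(v, y, n)
                 for x in range(m) for y in range(n))
             for v in range(n)] for u in range(m)]


def idct2(F):
    """Inverse 2-D DCT: rebuilds the spatial matrix from its coefficients."""
    m, n = len(F), len(F[0])
    return [[sum(_scale(u, m) * _scale(v, n) * F[u][v] * _basis(u, x, m) * _basis(v, y, n)
                 for u in range(m) for v in range(n))
             for y in range(n)] for x in range(m)]


# 8x8 blocks of 8-bit pixels, level-shifted by -128 as in JPEG.
white = [[255 - 128] * 8 for _ in range(8)]
black = [[0 - 128] * 8 for _ in range(8)]
print(round(dct2(white)[0][0], 6), round(dct2(black)[0][0], 6))  # 1016.0 -1024.0
```

For a constant block all AC coefficients vanish, and for a smooth ramp most of the energy lands in the DC term, which illustrates the energy-compaction observation.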
5. A three-level Discrete Wavelet Transform (DWT) [7, 8, 9, 10] is applied to the DCT-transformed frame.
Unlike waves, which are periodic oscillating functions of time or space, wavelets are localized waves whose energy is concentrated in time or space; they are used to analyze signals. The wavelet transform convolves the signal with particular instances of a wavelet at various time scales and positions. The DWT is based on sub-band coding, is easy to implement, requires limited time and resources, and computes the wavelet transform quickly. The DWT combines filtering with sub-sampling, which is down-sampling during decomposition and up-sampling during reconstruction. Filtering determines the resolution of the signal, a measure of the information it contains, while sub-sampling determines its scale. A multi-level DWT is computed by successive low-pass and high-pass filtering. At each decomposition level, frequency resolution doubles because the uncertainty in frequency is halved, while time resolution is halved: a signal of 500 samples, for example, is reduced to 250 samples after the first level. Consequently, time resolution is good at high frequencies and frequency resolution is good at low frequencies.
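The paper does not name the wavelet used in step 5. The sketch below assumes the Haar wavelet, the simplest orthonormal choice, and shows a three-level 2-D decomposition: each level applies a low-pass (scaled sum) and a high-pass (scaled difference) filter along the rows and then along the columns, keeping every second sample, so a 16×16 frame leaves a 2×2 approximation (LL) band after three levels. The function names are hypothetical, and frame sides must be divisible by 2³.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar2d(a):
    """One level of the orthonormal 2-D Haar DWT.

    Returns (LL, (LH, HL, HH)): the first letter is the filter applied along
    each row, the second the filter applied along each column.
    """
    row_lo = [[(r[2 * j] + r[2 * j + 1]) * INV_SQRT2 for j in range(len(r) // 2)] for r in a]
    row_hi = [[(r[2 * j] - r[2 * j + 1]) * INV_SQRT2 for j in range(len(r) // 2)] for r in a]

    def split_columns(mat):
        # Low-pass / high-pass along each column, keeping every second sample.
        pairs = range(len(mat) // 2)
        lo = [[(mat[2 * i][j] + mat[2 * i + 1][j]) * INV_SQRT2 for j in range(len(mat[0]))] for i in pairs]
        hi = [[(mat[2 * i][j] - mat[2 * i + 1][j]) * INV_SQRT2 for j in range(len(mat[0]))] for i in pairs]
        return lo, hi

    ll, lh = split_columns(row_lo)
    hl, hh = split_columns(row_hi)
    return ll, (lh, hl, hh)


def ihaar2d(ll, bands):
    """Invert one level of haar2d (up-sample and recombine)."""
    lh, hl, hh = bands

    def merge_columns(lo, hi):
        out = []
        for lo_row, hi_row in zip(lo, hi):
            out.append([(p + q) * INV_SQRT2 for p, q in zip(lo_row, hi_row)])
            out.append([(p - q) * INV_SQRT2 for p, q in zip(lo_row, hi_row)])
        return out

    row_lo, row_hi = merge_columns(ll, lh), merge_columns(hl, hh)
    out = []
    for lo_row, hi_row in zip(row_lo, row_hi):
        row = []
        for p, q in zip(lo_row, hi_row):
            row.extend([(p + q) * INV_SQRT2, (p - q) * INV_SQRT2])
        out.append(row)
    return out


def dwt(frame, levels=3):
    """Multi-level DWT: repeatedly decompose the LL band."""
    approx, details = frame, []
    for _ in range(levels):
        approx, bands = haar2d(approx)
        details.append(bands)
    return approx, details


def idwt(approx, details):
    """Inverse multi-level DWT, coarsest level first."""
    for bands in reversed(details):
        approx = ihaar2d(approx, bands)
    return approx


frame = [[(3 * i + 5 * j) % 17 for j in range(16)] for i in range(16)]
approx, details = dwt(frame, levels=3)
print(len(approx), len(approx[0]))  # 2 2
restored = idwt(approx, details)
```

Because the Haar filters are orthonormal, the decomposition preserves energy and the inverse reconstructs the frame exactly up to rounding, which is what steps 5 and 9 rely on.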

6. Singular Value Decomposition [15-19] is applied to both the DWT-transformed frame and the message frame.
Singular value decomposition (SVD) is a numerical technique from linear algebra that is used to diagonalize matrices in numerical analysis. Applying SVD to an image A of size M×N yields three matrices, U, S and V. U and V are unitary matrices of size M×M and N×N respectively, and S is an M×N diagonal matrix. SVD efficiently represents the intrinsic algebraic properties of an image: the singular values correspond to the brightness of the image, and the singular vectors reflect its geometric characteristics. The image A is represented as in equation 11.

$$A = U\,S\,V^{T} \tag{11}$$



The columns of U are called the left singular vectors and the columns of V the right singular vectors of A. The diagonal entries of S are called the singular values of A and are arranged in decreasing order. The singular values (SVs) of an image are very stable: when a small perturbation is added to an image, its SVs do not change significantly. An image matrix typically has many singular values that are small compared with the first one, and ignoring them when reconstructing the image has little effect on its quality.
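To illustrate equation 11 and the stability property without external libraries, the sketch below computes an SVD with the one-sided Jacobi (Hestenes) method: plane rotations are applied to pairs of columns until all columns are mutually orthogonal, after which the column norms are the singular values. In practice a library routine such as NumPy's `numpy.linalg.svd` would be used. Stability is made precise by Weyl's inequality, |σᵢ(A + E) - σᵢ(A)| ≤ ‖E‖₂ ≤ ‖E‖_F. The 4×4 block and the ±0.5 perturbation are made-up illustration data, and `svd` and `reconstruct` are hypothetical names.

```python
import math


def svd(a, tol=1e-12, max_sweeps=100):
    """Thin SVD by one-sided Jacobi rotations for an m x n matrix with m >= n.

    Returns (U, s, V) such that a = U * diag(s) * V^T, with s in decreasing order.
    """
    m, n = len(a), len(a[0])
    w = [[float(x) for x in row] for row in a]  # becomes U * diag(s)
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(w[i][p] ** 2 for i in range(m))
                beta = sum(w[i][q] ** 2 for i in range(m))
                gamma = sum(w[i][p] * w[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for mat in (w, v):  # apply the same rotation to W and V
                    for row in mat:
                        rp, rq = row[p], row[q]
                        row[p], row[q] = c * rp - s * rq, s * rp + c * rq
        if not rotated:
            break
    sigma = [math.sqrt(sum(w[i][j] ** 2 for i in range(m))) for j in range(n)]
    order = sorted(range(n), key=lambda j: -sigma[j])
    u = [[w[i][j] / sigma[j] if sigma[j] > 0 else 0.0 for j in order] for i in range(m)]
    return u, [sigma[j] for j in order], [[v[i][j] for j in order] for i in range(n)]


def reconstruct(u, s, v):
    """Compute U * diag(s) * V^T."""
    return [[sum(u[i][k] * s[k] * v[j][k] for k in range(len(s))) for j in range(len(v))]
            for i in range(len(u))]


block = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
u, s, v = svd(block)
noise = [[0.5 if (i + j) % 2 == 0 else -0.5 for j in range(4)] for i in range(4)]
_, s_noisy, _ = svd([[block[i][j] + noise[i][j] for j in range(4)] for i in range(4)])
print([round(x, 2) for x in s])
print(max(abs(x - y) for x, y in zip(s, s_noisy)))  # bounded by ||E||_F = 2.0 (Weyl)
```

The first singular value dominates the others, and no singular value moves by more than the size of the perturbation, which is the property the embedding step exploits.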

7. The singular values of the frame are modified according to the singular values of the message.

8. Inverse SVD is applied to get the watermarked DWT frame. 
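The text does not give the rule used in step 7. One common additive rule, shown here only as an assumed illustration, scales the singular values of the message by a strength factor α and adds them to those of the DWT-transformed frame; how α relates to the paper's gain factor is not stated. The rule assumes the message matrix W has been brought to the same size as the DWT band A, and that S, U_W and V_W are kept as keys for extraction.

$$\begin{aligned}
A &= U\,S\,V^{T}, \qquad W = U_{W}\,S_{W}\,V_{W}^{T}\\
S^{*} &= S + \alpha\,S_{W}\\
A^{*} &= U\,S^{*}\,V^{T}
\end{aligned}$$

Here A* is the watermarked DWT frame produced by the inverse SVD in step 8; because only the singular values change, the geometric structure carried by U and V is left intact.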

9. An inverse three-level DWT is performed to get the watermarked DCT frame.

10. An inverse DCT is applied to get the watermarked Y frame.

11. Inverse colorspace conversion is performed to get the watermarked frame.

12. The next frame is taken and steps 2 to 11 are repeated until all frames are watermarked.

13. All watermarked frames are combined to get the watermarked video. 

Figure 1 shows the results of the embedding process with the gain factor 100. 



Figure 1: Example of invisible video watermarking using the hybrid method with gain factor 100 and a binary message. (a) First five frames of the original video (Frames 1 to 5). (b) Watermarked frames 1 to 5.



B. Extraction Algorithm 



The extraction process proceeds step by step as follows.

The watermarked video, which may have been attacked, is converted into a sequence of frames.

1. The first frame is taken and converted from the RGB colorspace to the YCbCr colorspace.

2. The Y component of the frame is selected for watermark extraction.

3. A two-dimensional Discrete Cosine Transform is applied to the Y frame.

4. A three-level Discrete Wavelet Transform (DWT) is applied to the DCT-transformed frame.

5. Singular Value Decomposition is applied to the DWT-transformed frame.

6. The singular values are modified to recover the watermark message.
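If embedding used the additive rule S* = S + αS_W (an assumption; the paper does not state its rule), with the original singular values S and the message's singular vectors U_W and V_W stored as keys, step 6 would amount to inverting that rule on the possibly attacked DWT frame Ã*:

$$\begin{aligned}
\tilde{A}^{*} &= \tilde{U}\,\tilde{S}^{*}\,\tilde{V}^{T}\\
\tilde{S}_{W} &= \frac{\tilde{S}^{*} - S}{\alpha}\\
\tilde{W} &= U_{W}\,\tilde{S}_{W}\,V_{W}^{T}
\end{aligned}$$

A binary message could then be recovered by thresholding W̃, for example at 0.5 (also an assumption), and compared with the original message using the correlation metric of Section II.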

Figure 2 shows the messages recovered from the video.
Figure 2: Recovered Messages

C. Results 



Figure 3 shows results of the method on frame 1 considering various values of the gain factor. 
Figure 3: Results with various gain factors (a) Watermarked Frame 1 (b) Recovered Messages
Table 1 shows the results of the first five frames of the video taking gain factor to be 100. Table 2 shows the 
results of watermarking frame 1 with various gain factors. 



TABLE 1. PSNR results of the hybrid method with gain factor = 100 and binary message

Frame No.   PSNR (dB)   MSE       Correlation
1           37.0173     12.9227   0.9332
…           37.0311     …         0.9316

TABLE 2. Results of the hybrid method on frame 1 using various gain factors with binary message

Figure 4 shows the results of the scheme with gain factor 100 under various attacks. The number above each frame is its PSNR, and the label below it identifies the variant of the attack. Similarly, the number above each recovered watermark is its correlation value.
Figure 4: Results of watermarked frame and recovered message under various attacks (a) Average filtering with various mask sizes (b) Gaussian low-pass filtering with various standard deviations (c) Median filtering with various mask sizes (d) Compression with various quality values (e) Color reduction with various numbers of colors (f) Histogram equalization (g) Linear camera motion with various numbers of pixels (h) Rotation with various angles (i) Gaussian noise with zero mean and various variances (j) Salt & pepper noise with various densities (k) Speckle noise with various variances (l) Cropping with various crop regions

IV. CONCLUSION 

The following observations were made after implementing both the embedding and extraction algorithms. A gain factor of 100 is assumed for the observations on imperceptibility and robustness. The higher the PSNR, the higher the imperceptibility; the higher the correlation, the higher the robustness.

• Imperceptibility in this method decreases as the gain factor increases.

• Robustness decreases as the gain factor increases.

• The frames look visually fine if the resulting PSNR is above 28 dB. The message is visually identifiable if the resulting correlation is greater than 0.50.

• This method is robust against all the attacks tested (Figure 4).

Comparison with Spatial and Transform Domain Methods 

• Imperceptibility in this method is the highest among all the methods compared at the same gain factor.

• Robustness is considerably higher than that of the correlation-, DCT- and DWT-based methods and slightly lower than that of the SVD-based method.

A. M. Kothari et al. / International Journal of Signal and Image Processing Issues